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Abstract 

f) In a recent issue of this journal, Mordukhovich et al. pose and solve 

^^ an interesting non-differentiable generalization of the Heron problem 

in the framework of modern convex analysis. In the generalized Heron 
problem one is given fc + 1 closed convex sets in M"^ equipped with its 
Euclidean norm and asked to find the point in the last set such that 
the sum of the distances to the first k sets is minimal. In later work the 
authors generalize the Heron problem even further, relax its convexity 
assumptions, study its theoretical properties, and pursue subgradient 
QQ algorithms for solving the convex case. Here, we revisit the origi- 

t^ nal problem solely from the numerical perspective. By exploiting the 

>— : majorization-minimization (MM) principle of computational statistics 

"^ and rudimentary techniques from differential calculus, we are able to 

(^ construct a very fast algorithm for solving the Euclidean version of 

CN the generalized Heron problem. 






> 
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1 Introduction. 

In a recent article in this journal, Mordukhovich et al. [22] presented the 
following generalization of the classical Heron problem. Given a collection of 
closed convex sets {Ci, . . . , Ck} in M.'^, find a point x in the closed convex set 
S <ZM.'^ such that the sum of the Euclidean distances from x to Ci through 
Ck is minimal. In other words, 

k 

minimize -D(x) := \^ (i(x, Ci) subject to x G S", (1) 



where (i(x, Q) = mf{||x — y|| : y G fl}. 

A rich history of special cases motivates this problem formulation. When 
k = 2, Ci and C2 are singletons, and 5* is a line, we recover the problem orig- 
inally posed by the ancient mathematician Heron of Alexandria. The special 
case where k = 3; Ci, C2, and C3 are singletons; and 5 = M^ was suggested 
by Fermat nearly 400 years ago and solved by Torricelli [13] . In his Doctrine 
and Application of Fluxions, Simpson generalized the distances to weighted 
distances. In the 19th century, Steiner made several fundamental contribu- 
tions, and his name is sometimes attached to the problem [9l [11]. At the 
turn of the 20th century, the German economist Weber generalized Fermat 's 
problem to an arbitrary number of singleton sets Cj. Weiszfeld published 
the first iterative algorithrqj for solving the Fermat- Weber problem in 1937 
[28| |29] . In the modern era, the Fermat- Weber problem has enjoyed a re- 
naissance in various computational guises. Both the problem and associated 
algorithms serve as the starting point for many advanced models in location 
theory ^EiW\- 

The connections between celebrated problems such as the Fermat- Weber 
problem and the generalized Heron problem were noted earlier by Mor- 
dukhovich et al. [23]. In subsequent papers [211 ES], they generalize the 
Heron problem further to arbitrary closed sets, Ci, . . . ,Ck and S* in a Ba- 
nach space. Readers are referred to their papers for a clear treatment of how 
one solves these abstract versions of the generalized Heron problem with 
state-of-the-art tools from variational analysis. 

Here we restrict our attention to the special case of Euclidean distances 
presented by Mordukhovich et al. [23]. Our purpose is take a second look at 
this simple yet most pertinent version of the problem from the perspective 
of algorithm design. Mordukhovich et al. [2T1 [221 [23] present an iterative 
subgradient algorithm for numerically solving problem ([I]) and its general- 
izations, a robust choice when one desires to assume nothing beyond the 
convexity of the objective function. Indeed, the subgradient algorithm works 
if the Euclidean norm is exchanged for an arbitrary norm. However, it is nat- 
ural to wonder if there might be better alternatives for the finite-dimensional 
version of the problem with Euclidean distances. Here we present one that 
generalizes Weiszfeld's algorithm by invoking the majorization-minimization 
(MM) principle from computational statistics. Although the new algorithm 
displays the same kind of singularities that plagued Weiszfeld's algorithm 

^Kuhn [15] points out that Weiszfeld's algorithm has been rediscovered several times. 



the dilemmas can be resolved by slightly perturbing problem (jTl), which 
we refer to as the generalized Heron problem for the remainder of this article. 
In the limit, one recovers the solution to the unperturbed problem. As might 
be expected, it pays to exploit special structure in a problem. The new MM 
algorithm is vastly superior to the subgradient algorithms in computational 
speed for Euclidean distances. 

Solving a perturbed version of the problem by the MM principle yields 
extra dividends as well. The convergence of MM algorithms on smooth prob- 
lems is well understood theoretically. This fact enables us to show that solu- 
tions to the original problem can be characterized without appealing to the 
full machinery of convex analysis dealing with non-differentiable functions 
and their subgradients. Although this body of mathematical knowledge is 
definitely worth learning, it is remarkable how much progress can be made 
with simple tools. The good news is that we demonstrate that crafting an 
iterative numerical solver for problem ([I| is well within the scope of classical 
differential calculus. Our resolution can be understood by undergraduate 
mathematics majors. 

As a brief summary of things to come, we begin by recalling background 
material on the MM principle and convex analysis of different iable functions. 
This is followed with a derivation of the MM algorithm for problem ([I]) 
and consideration of a few relevant numerical examples. We end by proving 
convergence of the algorithm and characterizing solution points. 

2 The MM Principle. 

Although first articulated by the numerical analysts Ortega and Rheinboldt 
|24j, the MM principle currently enjoys its greatest vogue in computational 
statistics P, [17] . The basic idea is to convert a hard optimization problem 
(for example, non-differentiable) into a sequence of simpler ones (for exam- 
ple, smooth). The MM principle requires majorizing the objective function 
/(y) by a surrogate function g{y \ x) anchored at the current point x. Ma- 
jorization is a combination of the tangency condition (yf(x | x) = /(x) and 
the domination condition 5'(y | x) > /(y) for all y G M"^. The associated 
MM algorithm is defined by the iterates 

Xfe+i := argmin5f(y | x^). (2) 

yes 



Because 

/(xfe+i) < 5'(xfc+i I Xfc) < 5f(xfc I Xfc) = /(xfc), (3) 

the MM iterates generate a descent algorithm driving the objective function 
downhilL Constraint satisfaction is enforced in finding :sik+i- Under appro- 
priate regularity conditions, an MM algorithm is guaranteed to converge to 
a local minimum of the original problem [16] . 

3 Background on Convex Analysis. 

As a prelude to deriving an MM algorithm, we review some basic facts from 
convex analysis in the limited context of differentiable functions. Deeper 
treatments can be found in the references [3l HI [121 EH], |26] . Recall that a 
differentiable function /(y) is convex if and only if its domain S is convex 
and 

/(y)>/(x) + (V/(x),y-x), (4) 

for all X, y G S. Provided /(x) is twice differentiable, it is convex when 
its second differential d'^f{x.) is positive semidefinite for all x and strictly 
convex when d'^f{x) is positive definite for all x. These characterizations 
are a direct consequence of executing a second-order Taylor expansion of 
/(y) and applying the supporting hyperplane inequality Q. The supporting 
hyperplane inequality Q also leads to a succinct necessary and sufficient 
condition for a global minimum. A point x G S* is a global minimizer of /(y) 
on S if and only if 

(V/(x),y-x)>0 (5) 

for all y G S*. Intuitively speaking, every direction pointing into S must lead 
uphill. 

We conclude this section by reviewing projection operators [16]. Denote 
the projection of x onto a set fi C M'' by Pq(x). By definition Po(x) satisfies 

Pni'x) := argmin||x-y||. 
yen 

If r2 is a closed convex set in M'^, then -Pq(x) exists and is unique. Further- 
more, the projection operator is non-expansive in the sense that 

||Pa(x)-Pn(y)||<||x-y|| 



for all X, y G M'^. Non-expansion clearly entails continuity. Explicit formulas 
for the projection operator -Pn(x) exist when fi is a box, Euclidean ball, 
hyperplane, or halfspace. Fast algorithms for computing -Pq(x) exist for 
the unit simplex, the ii ball, and the cone of positive semidefinite matrices 

The projection operator and the distance function are intimately related 
through the gradient identity V(i(x, C)^ = 2[x — Pci^x)]. A standard proof 
of this fact can be found in reference \i2i P- 181]. If (i(x, C)^ > 0, then the 
chain rule gives 

V(i(x,C) = VVci(x,C)2 ""-^^W 



rf(x, C) 



On the interior of C, it is obvious that V(i(x, C) = 0. In contrast, differen- 
tiability of (i(x, C) at boundary points of C is not guaranteed. 

4 An MM Algorithm for the Heron Problem. 

Since it adds little additional overhead, we recast problem (fTl) in the Simpson 
form 

k 

minimize -D(x) := 2^7jC^(x, Cj) subject to x G S* (6) 

j=i 

involving a convex combination of the distances (i(x, Cj) with positive weights 
7j as suggested in [23] . We first derive an MM algorithm for solving problem 
^ when S* n Cj = for all i. This exercise will set the stage for attacking 
the more general case where S intersects one or more of the Cj. In practice 
quadratic majorization is desirable because it promotes exact solution of the 
minimization step of the MM algorithm. It takes two successive majorizations 
to achieve quadratic majorization in our setting. The first is the simple 
majorization 

rf(x,Q)<||x-Pc:.(x™)|| 

flowing directly from the definition of the distance function. The second is 
the majorization 

Vu < v^ + „ , [U - Um), (7) 



of the concave function a/u on the interval (0, oo). The combination of these 
two majorizations yields the quadratic majorization 



■i(x.C.) < ||x,,. - Pc.(x..)|| H- II'' ' ^°-':""'' ' ''''r - f'^-''^"'"' . (8) 



Summing these majorizations over i leads to quadratic majorization of -D(x) 
and ultimately to the MM algorithm map 



ip{x) = argmin< - V" 



^,||z-Pc,(x)|'' 



with weights Wi = 7j||x — Pc^(x)|| ^. When the Ci are singletons and S = M^, 
the map ^(x) implements Weiszfeld's algorithm for solving the Fermat- Weber 
problem p8ll29] . 

The quadratic majorization of -D(x) just derived can be rewritten as 



k . 2 



5'ix I x^ 



where 



+ c, 



Wi 

ai 



and c is a constant that does not depend on x. Thus, the MM update boils 
down to projection onto S" of a convex combination of the projections of the 
previous iterate onto the sets Cf, in symbols 



'-m+l 



Ps\Y,a,PcA^m)]. (9) 



The majorization (pj) involves dividing by when x^ belongs to Cj. This 
singularity also bedevils Weiszfeld's algorithm. Fortunately, perturbation 
of the objective function salvages the situation. One simply replaces the 
function D(x.) by the related function 



A(x) = ^7,V^(x,Q)2 + e 



for e small and positive. Ben-Tal and Teboulle [2] cover further examples 
of this perturbation strategy. In any case observe that the smooth function 
fe{u) = y/u"^ + e has derivatives 



and is therefore strictly increasing and strictly convex on the interval [0, oo). 
Hence, the function D^{x.) is also convex. Because a/m^ + e — ^/e is a good 
approximation to m > 0, the solutions of the two problems should be close. 
In fact, we will show later that the minimum point of -De(x) tends to the 
minimum point of -D(x) as e tends to 0. In the presence of multiple minima, 
this claim must be rephrased in terms of cluster points. 

The majorization d{'x,Cj) < ||x — Pcji^m)\\ around the current iterate 
Xm yields the majorization 



'di^,C,y + e < y^llx-Pc.lxJP + e. 
Application of the majorization ([T]) implies the further majorization 

n.^ ^ l^^ ||x-Pc^.(x^)f 

where c is an irrelevant constant. The corresponding MM update x^+i is 
identical to the previous MM update ^ except for one difference. The 
weights Wi are now defined by the benign formula 

li 

Wi - 



v/||x^-Pc,(x^)P + e 
involving no singularity. 

5 Examples. 

We now consider four examples illustrating the performance of the MM algo- 
rithm and framing our expectations for convergence. The subgradient algo- 
rithm [22j serves as a benchmark for comparison throughout. This algorithm 
relies on the updates 



Xm+l 



1=1 



where 



if X™ G C,;, 



and the nonnegative constants 7]^ satisfy Yl^=i Vm 
The weights 7, equal 1 in all examples except the last. 



00 and ^~^i vi<oo. 



Iteration 


Xi 


1 


0.00000000000000 


2 


-0.93546738305698 


3 


-0.92881282698649 


4 


-0.92645373003448 


5 


-0.92567602259658 


6 


-0.92542515217106 


7 


-0.92534495711879 


8 


-0.92531944712805 


9 


-0.92531135783449 


10 


-0.92530879826106 


20 


-0.92530761702316 


30 


-0.92530761701184 


50 


-0.92530761701184 



X2 



X3 



2.00000000000000 
1.66164748416805 
1.63915389878166 
1.63220797263449 
1.63004821970935 
1.62937435413374 
1.62916364685109 
1.62909766226627 
1.62907697582185 
1.62907048520349 
1.62906751412014 
1.62906751409212 
1.62906751409212 



0.00000000000000 
0.10207032020482 
0.08424264751830 
0.08007815377225 
0.07911751670489 
0.07889815178685 
0.07884864943702 
0.07883765997470 
0.07883527888603 
0.07883478238381 
0.07883466748783 
0.07883466748878 
0.07883466748878 



Table 1: Cubes and ball example in M : MM Algorithm 



5.1 Five Cubes and a Ball in M^. 

Our first example is taken from the reference |22]. This three-dimensional 
example involves five cubes Ci with side lengths equal to 2 and centers 
(0,-4,0), (-4, 2, -3), (-3, -4, 2), (-5,4,4), and (-1,8,1). The set ^ is a 
ball with center (0, 2, 0) and radius 1. Iteration commences at the point 
xi = (0, 2, 0) G S" and takes subgradient steps with 77^ = 1/m. Table [I] 
shows the MM iterates with e = 0. Convergence to machine precision oc- 
curs within 30 iterations. In contrast Table |2] shows that parameter values 
{xi,X2,X3) are still changing after 10^ subgradient iterates. For brevity we 
omit a second example of four squares and a disk in M^ from the same source 
|22j . In this example the superiority of the MM algorithm over the subgra- 
dient algorithm is equally evident. 
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Iteration 



Xi 



X2 



X3 



1 0.00000000000000 

10 -0.92583298353433 

100 -0.92531325048300 

1000 -0.92530767419684 

10000 -0.92530761758555 

100000 -0.92530761701755 

1000000 -0.92530761701233 

1500000 -0.92530761701231 

2000000 -0.92530761701229 



2.00000000000000 
1.63051788239768 
1.62908232435160 
1.62906766065418 
1.62906751554109 
1.62906751410641 
1.62906751409334 
1.62906751409328 
1.62906751409324 



0.00000000000000 

0.07947484741743 
0.07883822912883 
0.07883468589312 
0.07883466757273 
0.07883466748904 
0.07883466748881 
0.07883466748881 
0.07883466748881 



Table 2: Cubes and ball example in M.^: Subgradient Algorithm. 



5.2 The Closest Point to Three Disks in M^ 

This example from the reference |2T] illustrates the advantage of minimiz- 
ing a sequence of approximating functions D^^(x.). The sets Ci are three 
unit balls in M^ centered at (0,2), (2,0), and (-2,0). The set S equals M^. 
The minimum distance occurs at (0, 1) as can be easily verified by checking 
the optimality conditions spelled out in Proposition 4.3 in [2T\. Figure [I] 
displays the iteration paths for 50 different starting values (dots) and their 
corresponding fixed point (the square). Along the mth leg of the path we set 
em to be max{10~™, 10~^^}. The solution to the current problem is taken as 
the initial point for the next problem. All solution paths initially converge 
to a point just below (0,1) and then march collectively upwards to (0,1). 
The passage of the MM iterates through the unit balls is facilitated by our 
strategy of systematically reducing e. Table [3] shows the subgradient and 
MM iterates starting from the point (5,7). 



5.3 Three ColUnear Disks in R^. 

Here we illustrate the behavior of the MM algorithm when there is more than 
one solution. Consider two unit balls in M^ centered at (2,0), and (—2,0), 
and take S to be the unit ball centered at the origin. There is a continuum 
of solutions extending along the line segment from (—1,0) to (1,0), as can 
be verified by the optimality conditions provided by Theorem 3.2 in [22] ■ 
Figure [2] shows the iteration paths for 100 different initial values (dots) and 
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Figure 1: Finding the closest point to three disks in 



Subgradient Algorithm 


MM Algorithm 


Iteration 


Xi 


X2 


Iteration 


Xi 


X2 


10 


0.7092649 


1.2369866 


10 


0.2674080 


0.7570688 


100 


0.0558764 


0.9973310 


100 


0.0000000 


0.7249706 


1,000 


0.0046862 


0.9993844 


1,000 


0.0000000 


0.9998002 


10,000 


0.0003955 


0.9999274 


1,800 


0.0000000 


0.9999999 


100,000 


0.0000334 


0.9999957 


1,850 


0.0000000 


1.0000000 


1,000,000 


0.0000028 


0.9999998 


1,900 


0.0000000 


1.0000000 



Table 3: Three disks example in M^ starting from (5,7). 
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Figure 2: An example with a continuum of solutions. 



Subgradient Algorithm 


MM Algorithm 


Iteration xi X2 


Iteration xi xi 


10,000 0.9997648 0.0000223 

100,000 0.9997648 0.0000040 

1,000,000 0.9997648 0.0000007 


10 0.9941149 0.0001308 
20 0.9941149 0.0000000 
30 0.9941149 0.0000000 



Table 4: Three coUinear disks example in M^ starting from (1.5, 0.25) 

their corresponding fixed points (squares). In this example we take e = 0. 
Although the iterates are not guaranteed to converge and may in principle 
cycle among multiple cluster points, this behavior is not observed in practice. 
The iterates simply converge to different fixed points depending on where 
they start. Table |4] compares the iterations for the subgradient method and 
the MM algorithm starting from the point (1.5,0.25). The two algorithms 
converge to different solution points but at drastically different rates. 

5.4 Kuhn's Problem. 



Our last example was originally concocted by Kuhn [T4j to illustrate how 
Weiszfeld's algorithm can stall when its iterates enter one of the sets Cj. 
Although this event rarely occurs in practice, characterizing the initial con- 
ditions under which it happens has been a subject of intense scrutiny [5l EJ [H 
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Figure 3: A problem where Weiszfeld's algorithm fails to converge. 



IHl [15]. The occasional failure of Weiszfeld's algorithm prompted Vardi and 
Zhang [27] to redesign it. Their version preserves the descent property but 
differs substantially from ours. In any event the example shown in Figure [3] 
involves two points with weights 7j proportional to 5 placed at (59,0) and 
(20,0) and two more points with weights proportional to 13 placed at (-20, 48) 
and (-20, -48). The optimal point is the origin. Starting at (44,0), Weiszfeld's 
algorithm stalls at (20,0) after one iteration. Our MM iterates (dots) with e 
decreasing from 0.1 to 0, in contrast, move across (20.0) and correctly con- 
verge to (0,0) to within machine precision in 99 steps. Table [s] compares the 
progress achieved by the MM and subgradient methods. Note that when e is 
0.1, the MM algorithm overshoots the true answer and then comes back to 
(0, 0) after setting e to be 0. The subgradient algorithm makes solid progress 
early but subsequently slows down on this almost smooth problem. 



12 



Subgradient Algorithm 






MM Algorith] 


m 


Iteration xi X2 


Iterat 


ion 


Xi 


X2 


10 8.6984831 0.0000000 




10 


1.9448925 


0.0000000 


1,000 1.2966354 0.0000000 




30 


-0.0011998 


0.0000000 


100,000 0.1845171 0.0000000 




60 


-0.0012011 


0.0000000 


10,000,000 0.0259854 0.0000000 




90 


0.0000000 


0.0000000 



Table 5: Kulin's problem 

6 Convergence Theory. 

Before embarking on a proof of convergence, it is prudent to discuss whether 
a minimum point exists and is unique. Recall that a continuous function 
attains its minimum on a compact set. Thus, problem ^ possesses a min- 
imum whenever S is bounded. If S is unbounded, then one can substitute 
boundedness of one or more of the sets Cj. In this circumstance -D(x) is 
coercive in the sense that lim|jx||-s.oo -D(x) = oo. As pointed out in Proposi- 
tion 3.1 of the reference |22|, coerciveness is sufficient to guarantee existence. 
Because -D(x) < D^{x.), the perturbed criterion -De(x) is coercive whenever 
the original criterion -D(x) is coercive. Henceforth, we will assume that S or 
at least one of the Q is bounded. 

A strictly convex function possesses at most one minimum point on a con- 
vex set. The function |x| shows that this sufficient condition for uniqueness 
is hardly necessary. In the Fermat- Weber problem, where the closed convex 
sets Ci = {xj} are singletons, the function -D(x) is strictly convex if and only 
if the points Xj are non-coUinear. To generalize this result, we require the 
sets Ci to be non-collinear. Geometrically this says that it is impossible to 
draw a straight line that passes through all of the Cj. Non-coUinearity can 
only be achieved when k > 2 and n^^^Cj = 0. We also require the Ci to be 
strictly convex. A set C is said to be strictly convex if the interior of the line 
segment [x, y] connecting two different points x and y of C lies in the interior 
of C. Put another way, the boundary of C can contain no line segments. A 
singleton or a closed ball is strictly convex, but a closed box is not. 

Proposition 6.1. // the closed convex sets Ci, . . . ,Ck are strictly convex but 
not collinear, then -D(x) is strictly convex. 

Proof. Suppose the contrary is true, and choose x 7^ y and a strictly between 
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and 1 so that 

D[ax + (1 - a)y] = aD{x) + (1 - a)D{y). (10) 

Let L be the hne {sx + (1 — s)y : s G M} passing through the points x and 
y. Then there exists at least one Cj such that L Ci Cj = 0. In particular, x, 



y, and ax + (1 — a)y all fall outside this Cj. Equality (10) implies that 



a||x-Pc,(x)|| + (l-a)||y-Pc,(y)|| 
= ||ax+ (l-a)y- Pc'Jax+ (1 -a)y] II 
< ||ax + (1 - a)y - «Pc,(x) - (1 - «)Pc,(y)|| 
<a||x-Pc,(x)|| + (l-a)||y-Pc,(y)||. 

Since the projection of a point onto Cj is unique, these sandwich inequalities 
entail 

Pcjax + (1 - a)y] = aPc,(x) + (1 - «)Pc,(y). 

If Pc i'^) 7^ Pc.(y), then the strict convexity of Cj implies the convex com- 
bination aPc-(x) + (1 — a)Pc.(y) is interior to Cj. Hence, this point cannot 
be the closest point to the external point ax+ (1 — a)y. Therefore, consider 
the possibility Pcji'x) = Pcjiy) = z. Equality can occur in the inequality 

II ax + (1 — a)y — z\\ <a||x — z|| + (l — a)||y — z|| 

only when x — z = t(y — z) for some t ^ 1. This relation shows that 

1 t 
z = X y 

1-t 1-f^ 

belongs to LnCj, contradicting our hypothesis. Thus, -D(x) is strictly convex. 

D 

The next result shows that the function -De(x) inherits strict convexity 
from -D(x). Therefore, when -D(x) is strictly convex, -De(x) possesses a unique 
minimum point. 

Proposition 6.2. If D{x) is strictly convex, then -De(x) is also strictly con- 
vex. 
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Proof. Fix arbitrary x 7^ y and a strictly between and 1. The strict 
convexity of -D(x) implies that there is at least one j such that 

(i(a;x + (1 — a;)y, Cj) < ad{-x, Cj) + (1 — ot)d{y, Cj). 

The strict inequality 



rf(ax + (1 - a)y, C^f + e < W [arf(x, Cj) + (1 - a)rf(y, Cj)f + e, 



< « Jd(x, C,)2 + e + (1 - a)Jd{y, Cjf + e 



follows because the function fe{u) = y/u"^ + e is a strictly increasing and 
convex. Summing over j gives the desired result. D 

We now clarify the relationship between the minima of the D^{x.) and 
D{x.) functions. 

Proposition 6.3. For a sequence of constants e^ tending to 0, let y^ be 

a corresponding sequence minimizing D^^{x). If y is the unique minimum 
point of D{-x), then y^ tends to y. If D(x.) has multiple minima, then every 
cluster point of the sequence y^ minimizes -D(x). 

Proof. To prove the assertion, consider the inequalities 

DiyJ < D,,^{yJ < D„„(x) < Di(x) 

for any :si & S and e.^ < 1. Taking limits along the appropriate subse- 
quences proves that the cluster points of the sequence y^ minimize -D(x). 
Convergence to a unique minimum point y occurs provided the sequence y^ 
is bounded. If S is bounded, then y^ is bounded by definition. On the 
other hand, if any Cj is bounded, then -D(x) is coercive, and the inequality 
-D(y„i) < -Di(x) forces y.^ to be bounded. D 

The convergence theory of MM algorithms hinges on the properties of 
the algorithm map ip{x.) = argmiuy (yf(y | x). For easy reference, we state a 
simple version of Meyer's monotone convergence theorem [19] instrumental 
in proving convergence in our setting. 

Proposition 6.4. Let /(x) be a continuous function on a domain S andip^x) 
be a continuous algorithm map from S into S satisfying /('?/'(x)) < /(x) for 
all :si E S with ^(x) 7^ x. Suppose for some initial point xq that the set 
/^y(xo) = {x G S* : /(x) < /(xq)} is compact. Then (a) all cluster points are 
fixed points of ip(x.), and (b) \inirn^oo\\'^m+i — ^m\\ = 0. 

15 



Note that Proposition 6.4 also ensures the existence of at least one cluster 
point for the sequence of iterates x^+i = ^(xm). Additionally, the conver- 
gence of the MM iterates (|9| to a stationary point of /(x) follows immediately 
provided the fixed points of ?/'(x) are stationary points of /(x) and ipi^) pos- 
sesses only finitely many fixed points. 



Let us verify the conditions of Proposition 6.4 for minimizing D^{x.). The 
function -De(x) is continuous on its domain S, and the set ^^^(xo) is compact 
for any initial point xq since either S is compact or -De(x) is coercive. The 
continuity of the algorithm map follows immediately from the continuity of 
the projection mapping. Finally, we need to prove that D^{il){x.)) < -De(x) 
whenever x 7^ '^/'(x). First observe that ipi^.) = x if and only if the MM 
surrogate function satisfies g^i^x \ x) = mmy g^{y \ x). Since g^iy \ x) has 
a unique minimizer, we have the strict inequality g^{ilj{x.) \ x.) < g^{'x \ x) 
whenever x is not a fixed point of ijj. This forces a decrease in the objective 
function -De(x) and makes the MM algorithm strictly monotone outside the 
set of stationary points. 

We now argue that the fixed points of the algorithm map ipix) are sta- 
tionary points of -De(x). We will show, in fact, that the two sets of points 
coincide. To accomplish this, we need to determine the gradients of -De(x) 
and g^ix. \ y). Recall that fe{u) is strictly increasing and strictly convex. As 
a consequence the functions /e(||x||) and /e[(i(x, C^)] are convex. Even more 
remarkable is the fact that both functions are continuously different iable. 
When X 7^ 0, the function ||x|| is differentiable. Likewise, when x ^ Cj, the 
function (i(x, Cj) is differentiable. Therefore, the chain rule implies 

V/.(l|x||) = ^iL^ = -=4^ (11) 



V¥¥T~eM yq 



x||2 + e 



respectively. 

By continuity one expects the gradients to be defined for x = and 
X G Cj by the corresponding limit of 0. In the former case the expansion 



II 119 1 II 112 / II 112 

-^ Z -^ \\ ^ / "V" 

V||x||2 + e-v^ = v^-^/l + ^-v^ = 2T7r + ^'' 
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shows that V/e(||0||) = 0. In the latter case the expansion 



di.y,C,y + i 






rf(y,C, 



and the bound (i(y, Cj) = |(i(y, Cj) — (i(x, Cj)\ < ||y — x|| for x G Cj hkewise 



show that V/e[rf(x,Cj)] = 0. Consequently, equations (11) and (12) hold for 
all X G M'^. It follows that both -De(x) and g^{'x \ y) are differentiable on M'^, 
with gradients 

Y7n^ ^ V" x-Pc,(x) 

V/^.(x) = ^7,^=====, 



and 



V^.(x I y) 



k 



^c,(y) 



v/rf(y,C,)2 + e' 



(13) 



respectively. Note that y G S* minimizes -De(x) over S* if and only if 

(y-^c,(y),x-y) 



^^^■" yrf5^ 



>o, 



+ e 



for all X G 5. This inequality, however, is equivalent to the inequality 
(Vf7e(y I y),x — y) > 0, for all :>i. E S, which in turn holds if and only 
if y is a fixed point of ^(x). If -D(x) is strictly convex, then -De(x) has a 
unique minimum point, and ^(x) has exactly one fixed point. 



Thus, Proposition 6.3 and Proposition 6.4 together tell us that y is a 



solution to ^ if there is a sequence of e^ tending to zero and a sequence of 
points y^ tending to y that satisfy 



PcAyr 






<o, 



(14) 



for all X G iS. The above sufficient condition becomes necessary as well if 
D{x.) is strictly convex. As a sanity check, when the sets S fl Cj are all 
empty and the weights 7j are identical, we recover the characterization of 
the optimal points given in Theorem 3.2 of reference [22], albeit under the 
more restrictive assumption of strict convexity. 
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7 Conclusion 

There is admittedly an art to applying the MM principle. The majorization 
presented here is specific to Euclidean distances, and changing the underlying 
norm would require radical revision. Nonetheless, when the MM principle ap- 
plies, the corresponding MM algorithm can be effective, simple to code, and 
intuitively appealing. Here the principle lit the way to an efficient numerical 
algorithm for solving the Euclidean version of the generalized Heron problem 
using only elementary principles of smooth convex analysis. We also sug- 
gested a simple yet accurate approximation of the problem that removes the 
singularities of the MM algorithm and Weiszfeld's earlier algorithm. Similar 
advantages accrue across a broad spectrum of optimization problems. The 
ability of MM algorithms to handle high-dimensional problems in imaging, 
genomics, statistics, and a host of other fields testifies to the potency of a 
simple idea consistently invoked. Mathematical scientists are well advised to 
be on the lookout for new applications. 
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